August 22, 2018

Outline

What we'll cover today

  • Intro & general tips
  • R performance tips: patterns to use and avoid
  • Vectors & Matrices
  • Memory management, large tables
  • Profiling and Benchmarking
  • Loops

General advice

  • If you don't understand something, try some experiments
  • Browse the documentation, learn its jargon and quirks
  • Break your code into functions when appropriate
  • Use functions to reduce the need for global variables
  • Write tests for your functions
  • Use git to keep track of changes
  • Learn to distribute your code as a package

Is R slow?


Sometimes, but well written R programs are usually fast enough.

  • Designed to make programming easier
    • Speed was not the primary design criterion
  • Slow programs are often the result of bad programming practices or of not understanding how R works
  • There are various options for calling C or C++ functions from R
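
As an illustration of that last point, the Rcpp package makes it easy to compile a small C++ function from within R. A minimal sketch, assuming Rcpp is installed:

```r
# cppFunction compiles the C++ source and exposes it
# as an ordinary R function in the calling environment.
Rcpp::cppFunction('
double sumC(NumericVector x) {
  double s = 0;
  for (int i = 0; i < x.size(); i++) s += x[i];
  return s;
}')
sumC(c(1, 2, 3))  # 6
```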

R performance before you start

Premature optimization is the root of all evil – Donald Knuth

  • Become familiar with R's vector and apply functions
  • Consider specialized performance packages
  • E.g. data.table, bigmemory, plyr, RSQLite, snow, multicore, parallel
  • Don't use an R GUI when performance is important

R tuning advice

  • Be methodical but don't get carried away with micro-optimizations
  • Use monitoring tools such as top, Activity Monitor, and Task Manager
  • Use vector functions
  • Avoid duplication of objects
  • Pre-allocate result vectors
  • Profile your code and run benchmarks
  • Byte-compile with cmpfun, or call a compiled language (e.g. C, C++)

Vectors & Matrices

Vectors are central to good R programming

  • Fast, since vector operations are implemented as single C or Fortran functions
  • Concise and easy to read
  • Can often replace for loops
  • However, heavy use can result in high memory usage
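
A quick sketch of the kind of replacement meant here: a sum of squares written as an R-level loop versus a single vectorized expression.

```r
x <- rnorm(1e5)

# Loop version: one R-level iteration per element
s <- 0
for (v in x) s <- s + v * v

# Vectorized version: the work happens in compiled code
s2 <- sum(x * x)

all.equal(s, s2)  # TRUE (up to floating-point rounding)
```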

Useful vector functions

  • math operators: +, -, *, /, ^, %/%, %%
  • math functions: abs, sqrt, exp, log, log10, cos, sin, tan, sum, prod
  • logical operators: &, |, !
  • relational operators: ==, !=, <, >, <=, >=
  • string functions: nchar, tolower, toupper, grep, sub, gsub, strsplit
  • conditional function: ifelse (pure R code)
  • misc: which, which.min, which.max, pmax, pmin, is.na, any, all, rnorm, runif, sprintf, rev, paste, as.integer, as.character
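
A few of these in action, on small illustrative vectors:

```r
x <- c(3, -1, 7, NA, 2)

ifelse(is.na(x), 0, abs(x))   # 3 1 7 0 2
which.max(c(3, 9, 1))         # 2
pmax(c(1, 5), c(4, 2))        # 4 5
any(is.na(x))                 # TRUE
sprintf("%.2f", pi)           # "3.14"
```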

Dynamic features of vectors

Initialize x and fill it with zeros

n <- 10
x <- double(n)
x
 [1] 0 0 0 0 0 0 0 0 0 0

Extend x by assignment

x[15] <- 100
x
 [1]   0   0   0   0   0   0   0   0   0   0  NA  NA  NA  NA 100

Dynamic features of vectors

Resize/truncate x

length(x) <- 5
x
[1] 0 0 0 0 0

rnorm vector function

x <- rnorm(10)
x
 [1] -0.15836239 -1.43148503 -0.98957681 -1.05556511 -0.05897849
 [6]  0.32558374  0.39072691  0.17596137  0.38024952 -1.08059739

Vector indexing

Extract subvector

x[3:6]
[1] -0.98957681 -1.05556511 -0.05897849  0.32558374

Extract elements using result of vector relational operation

x[x > 0]
[1] 0.3255837 0.3907269 0.1759614 0.3802495

Vector indexing

You can also use an index to assign values

x[is.na(x)] <- 0

Matrix indexing

Make a new matrix

m <- matrix(rnorm(100), 10, 10)

Extract a 2x3 submatrix (non-consecutive columns)

m[3:4, c(5,7,9)]
          [,1]      [,2]      [,3]
[1,] -1.194728  0.184040  1.125859
[2,]  0.558309 -0.212854 -1.513369

Matrix indexing

Extract arbitrary elements as vector

m[cbind(3:6, c(2,4,6,9))]
[1] -0.00151433  0.42465008 -0.24771947  0.87036074

Extract elements using result of vector relational operation

head(m[m > 0])
[1] 0.8362644 0.7029953 0.5499310 0.4274378 0.6290166 0.2998121

Matrix indexing

You can also use a matrix index to assign values

m[is.na(m)] <- 0

Memory Considerations

Memory in R

  • Avoid duplicating objects, especially big ones or those in loops
  • Look into memory-efficient libraries
  • Look into other formats to store data

Beware of object duplication


  • R uses pass by value semantics for function arguments
  • In general, this requires making copies of objects
    • Functions must return modified object
  • R tries to avoid copying unless necessary

Example of object duplication

tracemem reports when an object is duplicated, which is very useful for debugging performance problems.

In this example, object duplication is expected and helpful.

x <- double(10)
tracemem(x)
[1] "<0x5642bee7d5f8>"
y <- x
y[1] <- 10
tracemem[0x5642bee7d5f8 -> 0x5642bec45c08]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> 

Example of object duplication

.Internal(inspect(x))
@5642bee7d5f8 14 REALSXP g0c5 [NAM(2),TR] (len=10, tl=0) 0,0,0,0,0,...
.Internal(inspect(y))
@5642bec45c08 14 REALSXP g0c5 [NAM(1),TR] (len=10, tl=0) 10,0,0,0,0,...
x[1] <- 50
tracemem[0x5642bee7d5f8 -> 0x5642bebfb3d8]: eval eval withVisible withCallingHandlers handle timing_fn evaluate_call <Anonymous> evaluate in_dir block_exec call_block process_group.block process_group withCallingHandlers process_file <Anonymous> <Anonymous> 

Example of unexpected object duplication

Passing a matrix to a non-primitive function such as nrow flags it as possibly shared, so the next modification triggers a duplication

> m <- matrix(0, 3, 3)
> tracemem(m)
[1] "<0x7fc168d29df0>"
> m[1,1] <- 1
> nrow(m)
[1] 3
> m[1,1] <- 2
tracemem[0x7fc168d29df0 -> 0x7fc168d21f58]:

Splitting problem into smaller tasks

R makes it easy to read entire data sets in one operation, but reading them in parts can be much more efficient.

  • Splitting the problem into smaller tasks is compatible with parallel computing techniques
  • The foreach & iterators packages provide tools to split inputs into smaller pieces
  • Use Linux commands (split, awk, etc.) or other languages (e.g. Python) to preprocess
    • Split data files and remove unneeded fields

Beware of read.table

The read.table function is commonly used for reading data files, but it can be very slow on large files

  • Use of the colClasses argument can improve performance
  • colClasses can be used to skip a column, using less memory
  • It can be faster to read a file in smaller chunks using the nrows argument
  • The scan function can be faster
  • Consider using similar functions from different packages, such as data.table, sqldf, and bigmemory
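
A small sketch of colClasses, including using "NULL" to skip a column entirely. A temporary file stands in for a real data file here:

```r
# Write a tiny stand-in CSV file
tf <- tempfile(fileext = ".csv")
writeLines(c("id,value,notes",
             "1,2.5,first",
             "2,3.5,second"), tf)

# Declare column types up front; "NULL" drops the notes column,
# so it is never parsed or stored
df <- read.table(tf, header = TRUE, sep = ",",
                 colClasses = c("integer", "numeric", "NULL"))
str(df)  # 2 obs. of 2 variables: id (int), value (num)
```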

Good alternative libraries

bigmemory

  • Defines mutable matrix objects that aren't automatically duplicated
  • works very well in conjunction with parallel computing

big.matrix

  • can use a backing file that is memory mapped

biganalytics

  • apply, biglm, bigglm, bigkmeans, colmax

bigtabulate

  • bigsplit, bigtabulate, bigtable, bigtsummary
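
A minimal big.matrix sketch, assuming the bigmemory package is installed; unlike a regular matrix, it is modified in place rather than duplicated:

```r
library(bigmemory)

bm <- big.matrix(nrow = 1000, ncol = 10, type = "double", init = 0)
bm[1, 1] <- 42   # writes in place; the matrix is not copied
bm[1, 1]         # 42
dim(bm)          # 1000 10
```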

Save data in binary format

Saving data in a binary format can make it much faster to read the data later. There are a variety of functions available to do that:

  • save/load
  • writeBin/readBin
  • write.big.matrix/read.big.matrix (from bigmemory)
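
For example, with save/load (the other pairs follow the same write-once, read-fast pattern):

```r
x <- rnorm(1e6)
f <- tempfile()

save(x, file = f)   # binary, compressed by default
rm(x)
load(f)             # restores x into the workspace by name

length(x)  # 1000000
```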

SQLite in R

Consider putting data into an SQLite database.

  • RSQLite package is easy to use
  • Easy to get subsets of the data into a data frame
  • Command line tool very useful for experimenting with queries
  • Database can be accessed from many different languages
  • The sqldf package may be useful also
  • Can be quite slow
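
A minimal RSQLite session, assuming the package is installed; an in-memory database stands in for a file-backed one:

```r
library(RSQLite)

con <- dbConnect(SQLite(), ":memory:")   # use a file path for persistence
dbWriteTable(con, "mtcars", mtcars)

# Pull only the subset you need into a data frame
heavy <- dbGetQuery(con, "SELECT * FROM mtcars WHERE wt > 3.5")
nrow(heavy)

dbDisconnect(con)
```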

Profiling and Benchmarking

Find your marble, then start chiseling

Profiling vs Benchmarking

Profiling

  • If you've decided you need your code to perform better, profile first
  • Profiling helps isolate hot spots
  • Time spent here will likely yield the best ROI

Benchmarking

  • With hot spots in hand, examine the code and propose alternatives
  • While ensuring the results are the same, ask which performs best

Profiling

R profiling tools

R has built-in support for profiling, but there are additional packages available:

  • proftools
  • profvis (RStudio integration)

Basic profiling with Rprof

f <- function(a) { g1(a) + g2(2 * a) }

g1 <- function(a) { h(a) }

g2 <- function(a) { sqrt(a) }

h <- function(a) {
  b <- double(length(a))
  for (i in seq_along(a)) {
    b[i] <- sqrt(a[i])
  }
  b
}

Basic profiling with Rprof

x <- 1:1000000
Rprof('prof.out')
for (i in 1:10) {
  y <- f(x)
}
Rprof(NULL)
summaryRprof("prof.out")$by.self
         self.time self.pct total.time total.pct
"h"           0.76    55.07       0.82     59.42
"g2"          0.44    31.88       0.54     39.13
"*"           0.06     4.35       0.06      4.35
"double"      0.06     4.35       0.06      4.35
"sqrt"        0.04     2.90       0.04      2.90
"+"           0.02     1.45       0.02      1.45

Basic profiling with profvis

Can also do this in RStudio, e.g. Profile -> Start Profiling

profvis({
  for (i in 1:10) {
    y <- f(x)
  }
})

Benchmarking

Once profiling has shown where the code is slow, use benchmarking tools

  • Put the problem code into a function
  • Benchmark different versions of code for comparison
  • system.time is useful for long running code
  • microbenchmark package is useful for analyzing short running code
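
A sketch of both tools on two versions of the same computation:

```r
slow.sum <- function(n) { s <- 0; for (i in 1:n) s <- s + sqrt(i); s }
fast.sum <- function(n) sum(sqrt(seq_len(n)))

# system.time: a single timing of a longer-running call
system.time(slow.sum(1e6))

# microbenchmark (if installed) runs each expression many times
# and reports the timing distribution:
# microbenchmark::microbenchmark(slow.sum(1e4), fast.sum(1e4))

all.equal(slow.sum(1e4), fast.sum(1e4))  # TRUE
```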

Loops

Wherefore art thou performant… or not?

Are for loops in R slow?

  • Not all for loops are bad
  • Most common mistakes involve for loops
  • The classic mistake is not preallocating a result vector

Example 1

Create a vector of length n where all values are x

Example 1: a bad for loop

bad.for <- function(n,x) {
  result <- NULL
  for (i in 1:n) {
    result[i] <- x
  }
  result
}

  • Large number of iterations
  • Tiny amount of computation per iteration
  • The result vector is reallocated and copied on each iteration, triggering garbage collection periodically

Example 1: a better for loop

okay.for <- function(n,x) {
  result <- double(n)
  for (i in 1:n) {
    result[i] <- x
  }
  result
}

Improvement over the previous example, but it's still slow because of the many tiny iterations.

Example 1: a puzzle loop

strange.for <- function(n, x) {
  result <- NULL
  for (i in n:1) {
    result[i] <- x
  }
  result
}

Is this loop faster or slower than the previous two?

Example 1: using a vector function

# use of vector assignment
vector.assn <- function(n, x) {
  result <- double(n)
  result[] <- x
  result
}

We can also use vector assignment

Example 1: using R built-in function
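
The code for this slide appears to be missing. Judging from the calls on the testing slide, it presumably used rep; a minimal version consistent with those tests:

```r
built.in <- function(n, x) {
  rep(x, n)   # rep does the allocation and fill in C; no R-level loop
}
```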

Example 1: testing

Make sure functions produce identical output

n <- 10000
x <- 7
bad.result        <- bad.for(n, x)
okay.result       <- okay.for(n, x)
strange.result    <- strange.for(n, x)
vector.result     <- vector.assn(n, x)
built.result      <- built.in(n, x)
c(identical(bad.result, okay.result),
identical(bad.result, strange.result),
identical(bad.result, vector.result),
identical(bad.result, built.result))
[1] TRUE TRUE TRUE TRUE

Example 1: benchmark results

res <- microbenchmark(bad=bad.for(n,x), okay=okay.for(n,x), strange=strange.for(n,x),
                      vector=vector.assn(n,x), builtin=built.in(n,x))
kable(summary(res, unit="relative"))
expr min lq mean median uq max neval
bad 190.440468 185.358769 97.97214 183.218901 155.579326 5.8206493 100
okay 38.552979 36.414985 18.26398 36.102501 28.842267 0.8006240 100
strange 39.146184 37.737766 18.26245 36.930750 28.729188 0.7198776 100
vector 1.721545 1.712761 1.64357 1.731775 1.592343 1.9128427 100
builtin 1.000000 1.000000 1.00000 1.000000 1.000000 1.0000000 100

Example 1: benchmark plot

autoplot(res)

Example 2

Create a matrix with n rows and x columns

Each value in the matrix is sampled from a normal distribution with μ = 0, σ = 1

Example 2: another bad for loop

Generate a matrix of values sampled from the normal distribution, with n rows and x columns

bad.norm <- function(n,x) {
  m <- NULL
  for (i in 1:n) {
    m <- rbind(m, rnorm(x))
  }
  m
}

Example 2: preallocation of result vector

Just like before, we build a matrix and populate it with a for loop

ok.norm <- function(n,x) {
  m <- matrix(0, nrow=n, ncol=x)
  for (i in 1:n) {
  m[i,] <- rnorm(x)
  }
  m
}

Example 2: use lapply and rbind

With lapply and rbind we don't need to preallocate the matrix

lapply.norm <- function(n,x) {
  do.call('rbind', lapply(1:n, function(i) rnorm(x)))
}

Example 2: Compute all rows at once

best.norm <- function(n,x) {
  m <- rnorm(x * n)
  dim(m) <- c(x, n)
  t(m)
}

Example 2: Test data

n <- 600
x <- 100
# Verify correct results
set.seed(123); bad.result <- bad.norm(n,x)
set.seed(123); ok.result <- ok.norm(n,x)
set.seed(123); lapply.result <- lapply.norm(n,x)
set.seed(123); best.result <- best.norm(n,x)

c(identical(bad.result, ok.result),
identical(bad.result, lapply.result),
identical(bad.result, best.result))
[1] TRUE TRUE TRUE

Example 2: benchmarks

res <- microbenchmark(bad=bad.norm(n,x), ok=ok.norm(n,x),
                        lapply=lapply.norm(n,x), best=best.norm(n,x))
kable(summary(res, unit="relative"))
expr min lq mean median uq max neval
bad 11.223780 10.354869 9.635543 10.412343 10.897349 1.6719011 100
ok 1.297873 1.378856 1.213990 1.326724 1.439724 0.1566525 100
lapply 1.359496 1.416385 1.251436 1.369213 1.429904 0.1831840 100
best 1.000000 1.000000 1.000000 1.000000 1.000000 1.0000000 100

Example 2: benchmark plot

autoplot(res)

Just in Time (JIT) compiling your functions

cmpfun (from the compiler package) compiles your function to bytecode

library(compiler)

fun.for <- function(x, seed=1423) {
  set.seed(seed)
  y <- double(length(x))
  for (i in seq_along(x)) {
    y[i] <- rnorm(1) * x[i]
  }
  y
}
fun.for.compiled <- cmpfun(fun.for)

Benchmarking JIT

x <- rnorm(10000)
res <- microbenchmark(fun.for=fun.for(x),
                      fun.for.compiled=fun.for.compiled(x))
kable(summary(res, unit="relative"))
expr min lq mean median uq max neval
fun.for 1.000000 1.000000 1.0000000 1.000000 1.000000 1.0000000 100
fun.for.compiled 1.010998 1.001442 0.1391575 1.003836 1.003227 0.0037725 100

Benchmarking JIT

autoplot(res)

Simple parallel computing

Simple parallel computing using mclapply

The standard parallel package includes a very useful function: mclapply

  • mclapply is nearly a drop-in replacement for lapply
  • mc.cores argument specifies number of workers to use
  • does not execute in parallel on Windows (depends on fork system call)
  • not generally safe to use in R GUIs (seems to work in RStudio)
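
A minimal usage sketch; the Windows guard reflects the fork limitation noted above:

```r
library(parallel)

ncores <- if (.Platform$OS.type == "windows") 1 else 2

# Nearly a drop-in replacement for lapply
res <- mclapply(1:8, function(i) i^2, mc.cores = ncores)
unlist(res)  # 1 4 9 16 25 36 49 64
```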

Parallel randomForest example

library(parallel)
library(randomForest)

x <- matrix(runif(500), 100)
y <- gl(2, 50)
ntree <- 1000
cores <- detectCores()
vntree <- rep(ntree %/% cores, cores)
worker <- function(n) randomForest(x, y, ntree=n)
rf <- do.call('combine',
              mclapply(vntree, worker, mc.cores=cores))

Resources